T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification

Authors

Abstract

Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic “translate-and-test” pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language, but the neural machine translator generates “soft” translations to permit end-to-end backpropagation during fine-tuning of the pipeline. Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc, and MultiEURLEX), with the results showing that the proposed approach has improved performance over a competitive baseline.
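To illustrate why “soft” translations can permit end-to-end backpropagation, here is a minimal sketch under stated assumptions. It assumes the translator's output at each position is a probability distribution over the classifier's vocabulary, and that the classifier consumes the expected embedding under that distribution instead of the embedding of a single argmax token. The abstract does not confirm this exact mechanism, and the names `softmax`, `soft_embedding`, `hard_embedding`, `EMBEDDINGS`, and `LOGITS` are hypothetical toy constructs, not part of the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_embedding(probs, embedding_table):
    """Expected embedding: sum over the vocabulary of p(v) * E[v].

    This is a smooth function of the probabilities, so gradients could
    flow from a downstream classifier back into the translator.
    """
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]

def hard_embedding(logits, embedding_table):
    """Embedding of the single most likely token (argmax).

    The argmax is piecewise constant, so it passes no useful gradient.
    """
    best = max(range(len(logits)), key=lambda i: logits[i])
    return list(embedding_table[best])

# Hypothetical toy values: a 3-token vocabulary with 2-dimensional embeddings.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
LOGITS = [2.0, 1.0, 0.1]  # assumed translator scores for one output position

soft = soft_embedding(softmax(LOGITS), EMBEDDINGS)
hard = hard_embedding(LOGITS, EMBEDDINGS)
# A very low temperature makes the soft embedding approach the hard one.
sharp = soft_embedding(softmax(LOGITS, temperature=0.01), EMBEDDINGS)

print("soft :", soft)
print("hard :", hard)
print("sharp:", sharp)
```

The design point is that the soft embedding is a convex combination of embedding rows, so it stays inside the classifier's embedding space while remaining differentiable with respect to the translator's scores.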


Similar resources

Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification ...


Cross-lingual Distillation for Text Classification

Cross-lingual text classification (CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. This paper presents a novel approach to CLTC that builds on model distillation, which adapts and extends a framework originally proposed for model compression. Using soft probabilistic predictions for the documents in a label-rich language as the (ind...


Transfer learning for text classification

Linear text classification algorithms work by computing an inner product between a test document vector and a parameter vector. In many such algorithms, including naive Bayes and most TFIDF variants, the parameters are determined by some simple, closed-form function of training set statistics; we call this mapping from statistics to parameters the parameter function. Much research in ...


Active Learning for Cross-Lingual Sentiment Classification

Cross-lingual sentiment classification aims to predict the sentiment orientation of a text in one language (the target language) with the help of resources from another language (the source language). However, current cross-lingual performance is normally far from satisfactory due to the large differences in linguistic expression and social culture. In this paper, we sugg...


Semi-Supervised Matrix Completion for Cross-Lingual Text Classification

Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. Cross-lingual text classification is widely studied in natural language processing as a way to reduce the expensive manual annotation effort required in the target lang...



Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2023

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00593